This page last changed on Aug 23, 2006 by [email protected].

How to load tiger data

(This version is the basic idea for 1 county. Its extendable to more.
Special care has to be given for topology (and other) error.
You should be also checking # of entities loaded as well)

1. get ogr2ogr and gdal (for tigerpoly.pl)

http://fwtools.maptools.org/ is probably the easiest place

2. Unzip and convert to PostGIS

NOTE: compressed (.zip) tiger is approximately 5.5Gb. Uncompressed its about 45Gb.

If all the tiger .zip files are in one directory (bash):

for f in *.zip
do
echo $f
mkdir ${f%.*}
mv $f ${f%.*}
cd ${f%.*}
unzip -q $f
cd ..
done

mv */*.zip ~/original_tiger2005fe

At the end of this you will have about 3000 directories, named like "tgr01033". Inside this directory will be the uncompressed tiger files for that one county.

Next you need to setup the database - we do this with any of the tiger counties.

First make sure you have a tiger2005fe postgis database

mv tgr01001  TIGERFIRSTMODULE
cd TIGERFIRSTMODULE
ogr2ogr -f "PostgreSQL" PG:"dbname=tiger2005fe user=postgres host=localhost" .

Next, load in each county:

for f in tgr*
do
echo $f
cd $f
ogr2ogr -update -append -f "PostgreSQL" PG:"dbname=tiger2005fe user=postgres host=localhost" .
cd ..
done

NOTE: this will not re-do tgr01001 ("TIGERFIRSTMODULE").
HINT: you can put this is a script and use "nohup" if you're running the program from a remote terminal (ignore this if you dont understand).

3. Create indexes in the database

create index indx_mod_completechain on completechain (module);
create index indx_mod_landmarks on landmarks (module);
create index indx_mod_altname on altname (module);
create index indx_mod_arealandmarks on arealandmarks (module);
create index indx_mod_entitynames on entitynames (module);
create index indx_mod_featureids on featureids (module);
create index indx_mod_idhistory on idhistory (module);
create index indx_mod_polychainlink on polychainlink (module);
create index indx_mod_pip on pip(module);
create index indx_mod_polygoncorrections on polygoncorrections (module);
create index indx_mod_tlidrange on tlidrange (module);
create index indx_mod_zerocellid on zerocellid (module);
create index indx_mod_zipcodes on zipcodes (module);
create index indx_mod_zipplus4 on zipplus4(module);
create index indx_mod_spatialmetadata on spatialmetadata (module);

vacuum analyse;

4. QA/QC check:

a) process the counts.txt files shipped with TIGER:

Use the counts_to_sql.pl script to generate SQL:

cat ../www2.census.gov/geo/tiger/tiger2005fe/*/counts*.txt > all_counts.txt
 perl  all_counts.pl <all_counts.txt >out.sql
 psql -U postgres tiger2005fe
    \i out.sql
    -- there should be 3232
    select count(distinct county) from counts_txt;

This gives a list of count of each feature type for each country. We're going to compare this to the actual number we loaded. If they are the same, our load probably worked!

b) count the actual number of features loaded.

create table qa_qc (module text,  filetype char, count int);

insert into qa_qc
 select module,'1',count(*) from completechain group by module;

insert into qa_qc
 select module,'7',count(*) from landmarks group by module;

insert into qa_qc
 select module,'4',count(*) from altname group by module;

insert into qa_qc
 select module,'8',count(*) from arealandmarks group by module;

insert into qa_qc
 select module'c',count(*) from entitynames group by module;

insert into qa_qc
 select module,'5',count(*) from featureids group by module;

insert into qa_qc
 select module,'h',count(*) from idhistory group by module;

insert into qa_qc
 select module,'p',count(*) from pip group by module;

insert into qa_qc
 select module,'i',count(*) from polychainlink group by module;

insert into qa_qc
 select module,'b',count(*) from polygoncorrections group by module;

insert into qa_qc
 select module,'r',count(*) from tlidrange group by module;

insert into qa_qc
 select module,'t',count(*) from zerocellid group by module;

insert into qa_qc
 select module,'6',count(*) from zipcodes group by module;

insert into qa_qc
 select module,'z',count(*) from zipplus4 group by module;

insert into qa_qc
 select module,'m',count(*) from spatialmetadata group by module;

c) compare the actual counts to the expected counts;

-------------------- check --------------

-- we didnt count up featuretypes '2','a','b','c','s'
-- and there will not be records in qa_qc where nfeats ==0
---  THIS SHOULD RETURN no rows
SELECT county,type,nfeats FROM counts_txt WHERE not(type in ('2','a','b','c','s') ) AND nfeats=!0
 EXCEPT
SELECT module,filetype,count FROM qa_qc ;
-- should return none! (if it does return something, then there was a load error)

-- need to do the "opposite" too
SELECT module,filetype,count FROM qa_qc
EXCEPT
SELECT county,type,nfeats FROM counts_txt WHERE not(type in ('2','a','c','s') );
- should return none! (if it does return something, then there was a load error)

-- should be 3232 (see above)
 SELECT count(distinct(module)) from qa_qc;
 SELECT count(distinct(county)) from counts_txt;
-------------------- check --------------
-- should be 0 (checks for null geometry and geometries that cross each other)
-- this takes forever!
select module,tlid from completechain where (wkb_geometry isnull) or not(issimple(wkb_geometry));

--should be 0 (no null pips)
select count(*) from pip where (wkb_geometry isnull);
Problem

In the above, you may find a problem with module='TGR66010' and tlid = 204575992.
This is Guam, and is a simple data error. You can correct it with:
BEGIN;
UPDATE completechain
set wkb_geometry = '01020000200100000005000000BBB9F8DB9E186240B4E6C75F5ACC2A404FCFBBB1A01862409DD5027B4CCC2A40E2E47E87A218624084D72E6D38CC2A402A91442FA31862408A5A9A5B21CC2A402A91442FA31862400D1CD0D215CC2A40'::geometry
WHERE module='TGR66010' and tlid = 204575992;
- should only update one row
COMMIT;

Here is a list of completechain feature codes (cfcc) and the number of features in each class:

 cfcc |  count
------+----------
 A11  |    86549
 A12  |      117
 A13  |     6399
 A14  |       67
 A15  |   191106
 A16  |      168
 A17  |     8222
 A18  |      438
 A19  |      707
 A21  |   780816
 A22  |      185
 A23  |     1959
 A24  |       36
 A25  |   173281
 A26  |       39
 A27  |      769
 A28  |      103
 A29  |     1255
 A31  |  1592487
 A32  |      168
 A33  |     2188
 A34  |       29
 A35  |    97673
 A36  |       10
 A37  |      291
 A38  |      175
 A39  |     2087
 A41  | 29571526
 A42  |     1180
 A43  |    13096
 A44  |     1051
 A45  |    83586
 A46  |       37
 A47  |      306
 A48  |      427
 A49  |     7704
 A51  |   520682
 A52  |      130
 A53  |       59
 A63  |   206622
 A64  |    42798
 A65  |      489
 A70  |     1087
 A71  |    31441
 A72  |      311
 A73  |    30260
 A74  |  2087931
 B11  |   655592
 B12  |     1255
 B13  |    10456
 B15  |       34
 B16  |       21
 B19  |       64
 B21  |   104685
 B22  |       37
 B23  |     1409
 B31  |     1958
 B32  |        4
 B33  |       11
 B40  |        3
 B50  |    12564
 B51  |      329
 B52  |       65
 C00  |       32
 C10  |    26579
 C20  |   498637
 C30  |     4265
 C31  |     1342
 C32  |      597
 D51  |    25938
 D54  |       23
 E00  |       92
 E10  |     6805
 E20  |     3544
 E21  |    14630
 E23  |        6
 E24  |      621
 E27  |        7
 F10  |  2089554
 F11  |    70372
 F12  |    38049
 F13  |   107965
 F14  |      221
 F15  |      301
 F16  |    56576
 F17  |    29752
 F18  |    11692
 F19  |      281
 F20  |      963
 F21  |   275890
 F22  |    80232
 F23  |   455150
 F24  |      977
 F25  |        3
 F30  |    11702
 F40  |   242448
 F52  |        1
 F70  |       18
 F71  |      504
 F72  |      240
 F73  |     2077
 F74  |       14
 F80  |        3
 F81  |   456602
 F83  |   833757
 F84  |   129215
 F85  |   259570
 F86  |     1537
 F87  |     4235
 F88  |      160
 H01  |  2480354
 H02  |    79731
 H10  |        3
 H11  |  4151361
 H12  |  3276916
 H13  |    10452
 H20  |        1
 H21  |   274493
 H22  |   167706
 H70  |    90150
 H71  |    18140
 H72  |       23
 H74  |    12026
 H75  |     3409
 H76  |   314006
 H77  |    15467
 H80  |        5
 P11  |      327
 P12  |        2
 P13  |       10
 P15  |     1728
 P16  |       14
 P17  |        7
 P19  |       19
 P21  |     1379
 P22  |        1
 P23  |       10
 P25  |     1292
 P27  |       18
 P28  |        3
 P29  |       10
 P31  |     1957
 P35  |     1344
 P37  |        1
 P39  |       15
 P41  |   239049
 P42  |        9
 P43  |       23
 P44  |        2
 P45  |      544
 P49  |       53
 P51  |     1061
 P63  |     2258
 P64  |      277
 P65  |       10
 P70  |        1
 P71  |      331
 P73  |      284
 P74  |     9024
(159 rows)

From the TIGER manual:

2005 First Edition TIGER/Line File Record Types

Record Type 1---Complete Chain Basic Data Record

Record Type 1 provides a single record for each unique complete chain
in the TIGER/Line files. The basic data record contains the end nodes
for the complete chain. This record also contains address ranges and
ZIP Codes (for most areas of the country where a street name/house
numbering system existed at the time of data extraction from the Census
TIGER database) and the Census 2000 census geographic entity codes for
each side of the complete chain. Additional feature identifier, address
range, and ZIP Code data related to Record Type 1 are found on Record
Types 4, 5, 6, and Z.

Record Type 2---Complete Chain Shape Coordinates

Record Type 2 provides an additional series of latitude and longitude
coordinate values describing the shape of each complete chain in Record
Type 1 that is not a straight line segment. That is, not all complete chains
in Record Type 1 have shape points and therefore not all have an associated
Record Type 2. Where a complete chain in Record Type 1 is not a
straight line, Record Type 2 may have a many-to-one relationship with
Record Type 1.

Record Type 4---Index to Alternate Feature Identifiers

Record Type 4 provides an index to alternate feature names associated
with the complete chain (Record Type 1). A Record Type 4 will not exist
for a Record Type 1 that has only one name. A complete chain can have
more than one alternate name. Record Type 4 has a many-to-one relationship
with Record Type 1 and a many-to-one relationship with Record
Type 5.

Record Type 5---Complete Chain Feature Identifiers

Record Type 5 contains a list of all unique feature names for complete
chains in the TIGER/Line files. Each name (or feature identifier) has an
identification code number (FEAT). Record Type 5 has a one-to-many
relationship with Record Type 4.

Record Type 6---Additional Address Range and ZIP Code Data

Record Type 6 provides additional address range information for a street
complete chain when the information cannot be presented as a single
address range (for example, the house/building numbers are not uniformly
arranged to form an address range). Record Type 6 appears only
for those counties that have address ranges and ZIP Code information in
the Census TIGER database. There is no assurance that the address
ranges provided on Record Type 6 will cover fewer addresses than the
address ranges appearing on Record Type 1. Data users must use Record
Type 6 to obtain the entire picture of the potential address ranges along a
complete chain. The address ranges used for geocoding along corporate
corridors and corporate offset limits appear only in Record Type 6.
Record Type 6 can have a one-to-one or a many-to-one relationship with
Record Types 1 and with Record Type Z.

Record Type 7---Landmark Features

Record Type 7 contains the area and point landmarks from the Census
TIGER database. If Record Type 7 represents an area landmark rather
than a point landmark, then a one-to-one relationship exists with Record
Type 8. Some of the information that was formerly in Record Type 9 now is
included in Record Type 7. If a county file has no landmarks Record Types 7
and 8 will not exist for that county.

Record Type 8---Polygons Linked to Area Landmarks

Record Type 8 links the polygon identification codes with the area landmark
identification codes. If a county file does not have any area landmarks then
there will not be a Record Type 7 or a Record Type 8 for that county. Record
Type 8 can have a one-to-one, one-to many, many-to-one, or many-to-many
relationship with Record Type P.

Record Type A---Polygon Geographic Entity Codes: Current Geography

Record Type A contains a record for each GT-polygon represented by
Record Type P in the TIGER/Line files. The U.S. Census Bureau provides
current (post-Census 2000) geographic entity codes---state, county, county
subdivision, place, American Indian/Alaska Native Area/Hawaiian Home
Land---on this record type.

Record Type B---Polygon Geographic Entity Codes: Corrections

Record Type B contains records for only those GT-polygons with
corrections. The U.S. Census Bureau provides corrected geographic entity
codes from the Count Question Resolution (CQR) Program for state,
county, county subdivision, place, and American Indian/Alaska Native
Area/Hawaiian Home Land areas on this record type. A Record Type B
will not exist in counties that do not contain corrected geography. The
CENID and POLYID fields may be used to link Record Type B to Record
Type S to determine which geographic entity code(s) were corrected.

Record Type C---Geographic Entity Names

Record Type C provides a unique list of all geographic codes, their associated
name, and some entity attributes in a flat (nonhierarchical) file. It
contains a Data Year field that may have five values: 2000 for geographic
names and codes valid for Census 2000, 200n (where 200n is the year of
extraction from the Census TIGER database) for geographic names and
codes valid for the current year, CQR0 for geographic names and codes
valid for corrected geography, EC02 for geographic names and codes
valid for the 2002 Economic Census, or blank when the geographic names
and codes for Census 2000, 200n, and CQR0 are the same. Multiple
records for the same geographic entity show its change or correction over
time. Record Type C is linked to other record types (1, A, B, E, and S)
through geographic entity codes.

Record Type E---Polygon Geographic Entity Codes: Economic Census

Record Type E provides the basic geographic entity codes---state, county, and
place---used in the Economic Census. Record Type E is linked to Record Type
C through geographic entity codes.

Record Type H---TIGER/Line ID History

Record Type H provides the history of each TIGER/Line ID when complete
chains (Record Type 1) are split or merged. Record Type H shows the TLIDs
of the complete chains in existence after the split or prior to the merge.

Record Type I---Link Between Complete Chains and Polygons

Record Type I links Record Type 1, the complete chain basic data, to
Record Type P, the GT-polygon internal point. The Record Type I to
Record Type 1 link (TLID) may be used to link complete chain attributes
and other data record types (2, 4, 6, H, U, and Z) to each other. The Record
Type I to Record Type P link (CENID and POLYID) may be used to link
polygon attributes and other data record types (8, A, B, E, and S) to each
other. Record Type I also serves as a link between the complete chain
attributes on Record Type 1 and the TIGER Zero-Cell ID (TZID) information
on Record Types T and U. Record Type I has a one-to-one relationship
with Record Types 1, T, and U but a many-to-one relationship with
Record Type P. When Record Type I is linked to a single-sided Record
Type 1 (county boundary), it will provide only the left- or the right-
GT-polygon identifier.

Record Type M---Feature Spatial Metadata Record

Record Type M provides spatial metadata for each feature in a TIGER/Line
file, identifying the source for the spatial coordinates. Record Type M can
have a one-to-one or many-to-one relationship with Record Type 1.
1-14 2005 First Edition TIGER/Line® Files

Record Type P---Polygon Internal Point

There is a Record Type P for every GT-polygon in the TIGER/Line files.
Record Type P has a one-to-one relationship with Record types A and S
and a one-to-many relationship with Record Type I and identifies the
internal point coordinates for each GT-polygon. See the Internal Points
section in Chapter 3.
The TIGER/Line files include all complete chains and GT-polygons in the
Census TIGER database. The topology of the Census TIGER database
ensures that a one-to-one relationship exists between the GT-polygons constructed
from Record Types 1 and 2 and Record Type P.

Record Type R---TIGER/Line ID Record Number Range

Record Type R contains the range of unique complete chain record
numbers (TLIDs) and TIGER Zero-Cell IDs (TZIDs) assigned to a census
file in a nationwide scheme. Record Type R has the lowest and the highest
record numbers for the range. Numbers are assigned to complete chains
or zero-cells beginning at the lowest value. The current number is the
highest record number for the census file used.
Each TIGER/Line file consists of an entire county or statistically equivalent
entity. In the Census TIGER database, the county or statistically
equivalent entity may be split into many partitions. The U.S. Census
Bureau assigns permanent record numbers to each of these partitions.
These record numbers are found in Record Type R. Record Type R is not
directly linked to any other record type.

Record Type S---Polygon Geographic Entity Codes: Census 2000

Record Type S contains a record for each GT-polygon represented by
Record Type P in the TIGER/Line files. The U.S. Census Bureau provides
the Census 2000 geographic entity codes that identify GT-polygons on this
record type.

Record Type T---TIGER Zero-Cell ID

There is a Record Type T for every TIGER Zero-Cell ID (TZID) in the
TIGER/Line files. Record Type T has a one-to-many relationship with
Record Type I.

Record Type U---TIGER/Line ID Overpass/Underpass Identification

Record Type U provides information identifying those complete chains
that are overpasses or underpasses and the order in which they cross
other complete chains. Record Type U has a one-to-one relationship with
Record Type T, a many-to-one relationship with Record Type 1, and a
one-to-many, many-to-one, or many-to-many relationship with Record
Type I.

Record Type Z---ZIP+4® Codes

Record Type Z provides Postal +4 Add-On codes that make ZIP+4 codes
out of the ZIP Codes on Type 1 and Type 6 records. Record Type Z has a
one-to-one or many-to-one relationship with Record Type 1 and with
Record Type 6.


all_counts.pl (application/octet-stream)
basicdefs2.gif (image/gif)
basicdefs.gif (image/gif)

There are a couple of typos above to be aware of in case you're cutting and pasting (as i did). 
First, the SQL stmt to count the number of entitynames per module is missing a comma after module.  Should be...

insert into qa_qc
select module,'c',count from entitynames group by module;

Second, the first line of SQL in section 4c has incorrect syntax (remove the !). The correct statement is...

SELECT county,type,nfeats FROM counts_txt WHERE not(type in ('2','a','b','c','s') ) AND nfeats=0

Also, I personally had trouble getting the data from puerto rico and guam loaded at all on linux. ogr2ogr failed with an invalid UTF-8 byte error. If anyone has gotten this to work correctly please post what you did.

There were also several counties for which the data from the gov's website contained no zip code data. I re-downloaded the zip files again tonight to double-check file size then loaded one into a fresh PostGIS install and still no zip code data. There must be something wrong with the files available for download. Here are the modules I'm referring to...

tiger=# SELECT county,type,nfeats FROM counts_txt WHERE not(type in ('2','a','b','c','s') ) AND nfeats=0
tiger-# EXCEPT
tiger-# SELECT module,filetype,count FROM qa_qc ;
county | type | nfeats
--------------------
TGR13307 | 6 | 0
TGR15005 | 4 | 0
TGR15005 | 6 | 0
TGR17069 | 6 | 0
TGR17151 | 6 | 0
TGR21039 | 6 | 0
TGR30069 | 6 | 0
TGR31009 | 6 | 0
TGR31103 | 6 | 0
TGR31117 | 6 | 0
TGR31183 | 6 | 0
TGR35021 | 6 | 0
TGR46017 | 6 | 0
TGR46095 | 6 | 0
TGR46113 | 6 | 0
TGR46137 | 6 | 0
TGR47175 | 6 | 0
TGR48261 | 4 | 0
TGR48301 | 6 | 0
TGR51021 | 6 | 0
TGR51029 | 6 | 0
TGR51091 | 6 | 0
TGR51115 | 6 | 0
TGR51540 | 8 | 0
TGR51685 | 8 | 0
(25 rows)

Posted by [email protected] at Mar 25, 2006 09:44

You probably have the encoding set to UNICODE, set it to SQL_ASCII. I think on unix you get SQL_ASCII as the default, but the postgis installer on Windows sets it as UNICODE by default.

I think the answer is to manually install postgis on windows:
a. create a database – there's options to set the encoding
b. add the plpgsql language to your database
c. find the lwpostgis.sql file in your postgresql install (should be in share/contrib/ or share/contrib/postgis) and execute it.
d. (optional) add the spatial_ref_sys.sql to your database

You should be good to go from there. See the postgis install page (postgis.refractions.net) for more details.

Posted by dblasby at Apr 18, 2006 18:40

Ah, but I'm running SuSE 10.0 and postgresql defaults to SQL_ASCII on Linux, as you mentinoed above.  Note that I was using the 2005fe dataset, which appears to have possibly been updated, as when i made the post above puerto rico failed, but i have successfully loaded it now. 

 
a note on puerto rico, i had to create my db with a different encoding (WIN1252) to get your java programs to work and maintain the spanish characters.  a UTF8 encoding threw an "invalid UTF-8 byte sequence" error and SQL_ASCII just mangled the characters during loading since they were out of the ASCII range.  also, the java programs such as MajorRoads.java do not recognize the spanish equivalents to their U.S. counterparts (i.e. Expreso).  just a head's up!

Posted by [email protected] at Jun 06, 2006 15:49

I'm using a UTF-8 database, and used the following to get around the PR issue:

PGCLIENTENCODING=LATIN1 ogr2ogr...

Works great.

Daniel

Posted by daniel ceregatti at Jul 21, 2006 20:25
Document generated by Confluence on Jan 16, 2008 23:28